Rheumatoid arthritis


Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning

Zhang, Ruihao, Chen, Mao, Ye, Fei, Meng, Dandan, Huang, Yixuan, Liu, Xiao

arXiv.org Artificial Intelligence

T cell receptor (TCR) repertoires encode critical immunological signatures for autoimmune diseases, yet their clinical application remains limited by sequence sparsity and low witness rates. We developed EAMil, a multi-instance deep learning framework that leverages TCR sequencing data to diagnose systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) with exceptional accuracy. By integrating Prime-Seq feature extraction with ESM one-hot encoding and enhanced gated attention mechanisms, our model achieved state-of-the-art performance, with AUCs of 98.95% for SLE and 97.76% for RA. EAMil identified disease-associated genes with over 90% concordance with established differential analyses and effectively distinguished disease-specific TCR genes. The model demonstrated robustness in classifying multiple disease categories, using the SLEDAI score to stratify SLE patients by disease severity and to identify the site of damage in SLE patients, while controlling for confounding factors such as age and gender. This interpretable framework for immune receptor analysis provides new insights for autoimmune disease detection and classification, with broad potential clinical applications across immune-mediated conditions.
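The abstract does not spell out the "enhanced gate attention" layer; the sketch below is a minimal NumPy version of standard gated-attention pooling for multi-instance learning (in the style of Ilse et al., 2018), where one bag of instance embeddings stands in for one patient's TCR repertoire. The dimensions and random matrices are hypothetical placeholders for learned parameters, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def gated_attention_pool(H, V, U, w):
    """Gated-attention pooling over a bag of instance embeddings.

    H: (n_instances, d) bag of per-sequence embeddings
    V, U: (d, k) projections; w: (k,) scoring vector.
    Score per instance: w^T (tanh(V^T h_i) * sigmoid(U^T h_i)),
    softmax-normalized into attention weights.
    """
    scores = (np.tanh(H @ V) * (1.0 / (1.0 + np.exp(-(H @ U))))) @ w
    a = np.exp(scores - scores.max())   # stable softmax
    a /= a.sum()
    return a @ H, a                     # weighted bag embedding, weights

d, k, n = 8, 4, 16                      # toy embedding dim, attention dim, bag size
H = rng.normal(size=(n, d))             # one bag = one patient's repertoire
V, U, w = rng.normal(size=(d, k)), rng.normal(size=(d, k)), rng.normal(size=k)
z, a = gated_attention_pool(H, V, U, w)
print(z.shape, a.shape, round(a.sum(), 6))  # → (8,) (16,) 1.0
```

The attention weights `a` are what makes such a model interpretable: high-weight instances are the sequences the classifier leaned on, which is how disease-associated TCRs can be surfaced.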




Automated Radiographic Total Sharp Score (ARTSS) in Rheumatoid Arthritis: A Solution to Reduce Inter-Intra Reader Variation and Enhancing Clinical Practice

Moradmand, Hajar, Ren, Lei

arXiv.org Artificial Intelligence

Assessing the severity of rheumatoid arthritis (RA) using the Total Sharp/Van Der Heijde Score (TSS) is crucial, but manual scoring is often time-consuming and subjective. This study introduces an Automated Radiographic Sharp Scoring (ARTSS) framework that leverages deep learning to analyze full-hand X-ray images, aiming to reduce inter- and intra-observer variability. The research uniquely accommodates patients with joint disappearance and variable-length image sequences. We developed ARTSS using data from 970 patients, structured into four stages: I) image pre-processing and re-orientation using ResNet50, II) hand segmentation using UNet.3, III) joint identification using YOLOv7, and IV) TSS prediction using models such as VGG16, VGG19, ResNet50, DenseNet201, EfficientNetB0, and Vision Transformer (ViT). We evaluated model performance with intersection over union (IoU), mean average precision (mAP), mean absolute error (MAE), root mean squared error (RMSE), and Huber loss. The average TSS from two radiologists was used as the ground truth. Model training employed 3-fold cross-validation, with each fold consisting of 452 training and 227 validation samples, and external testing included 291 unseen subjects. Our joint identification model achieved 99% accuracy. The best-performing model, ViT, achieved a notably low Huber loss of 0.87 for TSS prediction. Our results demonstrate the potential of deep learning to automate RA scoring, which can significantly enhance clinical practice. Our approach addresses the challenge of joint disappearance and variable joint numbers, offers time-saving benefits, reduces inter- and intra-reader variability, improves radiologist accuracy, and aids rheumatologists in making more informed decisions.
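Among the reported metrics, the Huber loss used for TSS prediction is quadratic for small residuals and linear beyond a threshold, which keeps outlier joints or scores from dominating the regression. A minimal sketch (the `delta` value here is an illustrative choice, not the paper's setting):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5*r^2 for |r| <= delta, else delta*(|r| - 0.5*delta)."""
    r = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    quad = 0.5 * r ** 2
    lin = delta * (r - 0.5 * delta)
    return float(np.where(r <= delta, quad, lin).mean())

# Toy TSS predictions: residuals 0, 1, 10 -> losses 0, 0.5, 9.5
print(huber_loss([10, 20, 30], [10, 21, 40], delta=1.0))  # → 3.3333... (mean of 0, 0.5, 9.5)
```

The linear tail is why a low Huber loss is a meaningful headline number for a score like TSS, whose distribution is heavy-tailed across patients.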


Evaluating the Unseen Capabilities: How Many Theorems Do LLMs Know?

Li, Xiang, Xin, Jiayi, Long, Qi, Su, Weijie J.

arXiv.org Artificial Intelligence

Accurate evaluation of large language models (LLMs) is crucial for understanding their capabilities and guiding their development. However, current evaluations often inconsistently reflect the actual capacities of these models. In this paper, we demonstrate that one of many contributing factors to this "evaluation crisis" is the oversight of unseen knowledge -- information encoded by LLMs but not directly observed or not yet observed during evaluations. We introduce KnowSum, a statistical framework designed to provide a more comprehensive assessment by quantifying the unseen knowledge for a class of evaluation tasks. KnowSum estimates the unobserved portion by extrapolating from the appearance frequencies of observed knowledge instances. We demonstrate the effectiveness and utility of KnowSum across three critical applications: estimating total knowledge, evaluating information retrieval effectiveness, and measuring output diversity. Our experiments reveal that a substantial volume of knowledge is omitted when relying solely on observed LLM performance. Importantly, KnowSum yields significantly different comparative rankings for several common LLMs based on their internal knowledge.
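The abstract does not give KnowSum's estimator in closed form; a classical species-richness estimator such as Chao1, which likewise extrapolates unseen items from the frequencies of observed ones, illustrates the idea. The theorem names below are hypothetical toy data, not results from the paper.

```python
from collections import Counter

def chao1_unseen(observed_counts):
    """Estimate unseen items from observed frequencies (Chao1):
    unseen ≈ f1^2 / (2*f2), where f1/f2 = number of items seen
    exactly once/twice. Illustrative only; KnowSum's estimator
    may differ in detail."""
    freq = Counter(observed_counts.values())
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    return f1 * f1 / (2 * f2) if f2 else f1 * (f1 - 1) / 2

# Toy example: theorem names "recalled" by a model across many prompts.
counts = Counter(["pythagoras", "pythagoras", "fermat", "bayes", "bayes",
                  "cauchy", "noether", "noether", "noether", "green"])
# f1 = 3 (fermat, cauchy, green), f2 = 2 (pythagoras, bayes)
print(chao1_unseen(counts))  # → 2.25
```

The intuition carries over directly: many singletons relative to doubletons suggest a long tail of knowledge the sampling has not yet surfaced, so observed performance understates total knowledge.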


Woman says ChatGPT saved her life by helping detect cancer, which doctors missed

FOX News

Fox News senior medical analyst Dr. Marc Siegel joined 'Fox & Friends' to discuss the impact of artificial intelligence on medicine and his take on President Trump's decision to withdraw from the World Health Organization. A mother of two credits ChatGPT with saving her life, claiming the artificial intelligence chatbot flagged the condition leading to her cancer when doctors missed it. Lauren Bannon, who divides her time between North Carolina and the U.S. Virgin Islands, first noticed in February 2024 that she was having trouble bending her fingers in the morning and evening, as reported by Kennedy News and Media. After four months, the 40-year-old was told by doctors that she had rheumatoid arthritis, despite testing negative for the condition.


Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis

Maharana, Umakanta, Verma, Sarthak, Agarwal, Avarna, Mruthyunjaya, Prakashini, Mahapatra, Dwarikanath, Ahmed, Sakir, Mandal, Murari

arXiv.org Artificial Intelligence

Large language models (LLMs) offer a promising pre-screening tool, improving early disease detection and providing enhanced healthcare access for underprivileged communities. The early diagnosis of various diseases continues to be a significant challenge in healthcare, primarily due to the nonspecific nature of early symptoms, the shortage of expert medical practitioners, and the need for prolonged clinical evaluations, all of which can delay treatment and adversely affect patient outcomes. With impressive predictive accuracy across a range of diseases, LLMs have the potential to revolutionize clinical pre-screening and decision-making for various medical conditions. In this work, we study the diagnostic capability of LLMs for rheumatoid arthritis (RA) with real-world patient data. Patient data was collected alongside diagnoses from medical experts, and the performance of LLMs was evaluated against expert diagnoses for RA prediction. We notice an interesting pattern in disease diagnosis and find an unexpected misalignment between prediction and explanation. We conduct a series of multi-round analyses using different LLM agents. The best-performing model accurately predicts RA approximately 95% of the time. However, when medical experts evaluated the reasoning generated by the model, they found that nearly 68% of it was incorrect. This study highlights a clear misalignment between LLMs' high prediction accuracy and their flawed reasoning, raising important questions about relying on LLM explanations in clinical settings. In short, LLMs provide incorrect reasoning to arrive at the correct answer for RA disease diagnosis.


Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare

Ahsan, Hiba, Sharma, Arnab Sen, Amir, Silvio, Bau, David, Wallace, Byron C.

arXiv.org Artificial Intelligence

We know from prior work that LLMs encode social biases, and that this manifests in clinical tasks. In this work we adopt tools from mechanistic interpretability to unveil sociodemographic representations and biases within LLMs in the context of healthcare. Specifically, we ask: Can we identify activations within LLMs that encode sociodemographic information (e.g., gender, race)? We find that gender information is highly localized in middle MLP layers and can be reliably manipulated at inference time via patching. Such interventions can surgically alter generated clinical vignettes for specific conditions, and also influence downstream clinical predictions which correlate with gender, e.g., patient risk of depression. We find that representation of patient race is somewhat more distributed, but can also be intervened upon, to a degree. To our knowledge, this is the first application of mechanistic interpretability methods to LLMs for healthcare.
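The activation patching described can be illustrated on a toy network: cache activations from one run, substitute them into another at inference time, and observe the downstream effect. The two-layer NumPy MLP below is a deliberately simplified stand-in for an LLM's MLP layers (in a real transformer, residual connections and many parallel paths make the effect of a single-layer patch partial rather than total, which is exactly what localization studies measure).

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp_forward(x, params, patch_layer=None, patch_value=None):
    """Two-layer tanh MLP; optionally overwrite (patch) one layer's
    activations at inference time, as in activation-patching studies."""
    h, acts = x, []
    for i, (W, b) in enumerate(params):
        h = np.tanh(h @ W + b)
        if i == patch_layer:
            h = patch_value          # intervene: substitute cached activations
        acts.append(h)
    return h, acts

params = [(rng.normal(size=(4, 8)), rng.normal(size=8)),
          (rng.normal(size=(8, 2)), rng.normal(size=2))]
x_a, x_b = rng.normal(size=4), rng.normal(size=4)   # two hypothetical inputs

out_a, acts_a = mlp_forward(x_a, params)            # cache run A's activations
out_b, _ = mlp_forward(x_b, params)                 # clean run B
out_patched, _ = mlp_forward(x_b, params, patch_layer=0, patch_value=acts_a[0])
# In this toy net everything downstream depends only on layer 0, so the
# patched run B reproduces run A's output exactly:
print(np.allclose(out_patched, out_a))  # → True
```

The paper's interventions work analogously at the level of middle MLP layers in a real LLM, where a successful patch shifts gendered content in generated vignettes rather than copying an output wholesale.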


Brain implants to treat epilepsy, arthritis, or even incontinence? They may be closer than you think

The Guardian

Oran Knowlson, a British teenager with a severe type of epilepsy called Lennox-Gastaut syndrome, became the first person in the world to trial a new brain implant last October, with phenomenal results – his daytime seizures were reduced by 80%. "It's had a huge impact on his life and has prevented him from having the falls and injuring himself that he was having before," says Martin Tisdall, a consultant paediatric neurosurgeon at Great Ormond Street Hospital (Gosh) in London, who implanted the device. "His mother was talking about how he's had such an improvement in his quality of life, but also in his cognition: he's more alert and more engaged." Oran's neurostimulator sits under the skull and sends constant electrical signals deep into his brain with the aim of blocking abnormal impulses that trigger seizures. The implant, called a Picostim and about the size of a mobile phone battery, is recharged via headphones and operates differently between day and night. "The device has the ability to record from the brain, to measure brain activity, and that allows us to think about ways in which we could use that information to improve the efficacy of the stimulation that the kids are getting," says Tisdall. "What we really want to do is to deliver this treatment on the NHS."


Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset

Liu, Junling, Zhou, Peilin, Hua, Yining, Chong, Dading, Tian, Zhongyu, Liu, Andrew, Wang, Helin, You, Chenyu, Guo, Zhenhua, Zhu, Lei, Li, Michael Lingzhi

arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) have transformed the field of question answering (QA). However, evaluating LLMs in the medical field is challenging due to the lack of standardized and comprehensive datasets. To address this gap, we introduce CMExam, sourced from the Chinese National Medical Licensing Examination. CMExam consists of 60K+ multiple-choice questions for standardized and objective evaluations, as well as solution explanations for model reasoning evaluation in an open-ended manner. For in-depth analyses of LLMs, we invited medical professionals to label five additional question-wise annotations, including disease groups, clinical departments, medical disciplines, areas of competency, and question difficulty levels. Alongside the dataset, we further conducted thorough experiments with representative LLMs and QA algorithms on CMExam. The results show that GPT-4 achieved the best accuracy (61.6%) and a weighted F1 score of 0.617. These results highlight a great disparity compared with human accuracy, which stood at 71.6%. For explanation tasks, while LLMs could generate relevant reasoning and demonstrate improved performance after finetuning, they fall short of a desired standard, indicating ample room for improvement. To the best of our knowledge, CMExam is the first Chinese medical exam dataset to provide comprehensive medical annotations. The experiments and findings of LLM evaluation also provide valuable insights into the challenges and potential solutions in developing Chinese medical QA systems and LLM evaluation pipelines. The dataset and relevant code are available at https://github.com/williamliujl/CMExam.
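The reported weighted F1 (per-class F1 averaged with class supports as weights) can be computed from predictions with a short routine; the sketch below implements it directly, using toy multiple-choice labels A-D rather than CMExam data.

```python
import numpy as np

def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1, averaged with weights = class support."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, support = np.unique(y_true, return_counts=True)
    f1s = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return float(np.average(f1s, weights=support))

# Toy 4-option MCQ answers: one "C" question answered as "A".
y_true = list("AABBCCDD")
y_pred = list("AABBCADD")
print(round(weighted_f1(y_true, y_pred), 3))  # → 0.867
```

Because supports weight the average, an error on a rare answer class moves weighted F1 less than plain macro F1 would, which is worth keeping in mind when comparing the 0.617 figure against accuracy.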